DOCUMENT RESUME

ED 127 3^5                                        TM 005 434

AUTHOR          Sanders, James R.; Nafziger, Dean H.
TITLE           A Basis for Determining the Adequacy of Evaluation
                Designs.
INSTITUTION     Northwest Regional Educational Lab., Portland,
                Oreg.
SPONS AGENCY    Alaska State Dept. of Education, Juneau.
PUB DATE        Oct 75
NOTE            57p.; For related documents, see TM 005 430 and
                431
EDRS PRICE      MF-$0.83 HC-$3.50 Plus Postage.
DESCRIPTORS     *Check Lists; Criteria; Data Collection; Decision
                Making; *Evaluation; *Evaluation Criteria;
                *Evaluation Methods; Evaluation Needs; Guidelines;
                Information Utilization; *Program Evaluation; Program
                Planning; Standards

ABSTRACT 

          A basis is provided for judging the adequacy of
evaluation plans or evaluation designs in this document. It is
assumed that using the procedures suggested to determine the adequacy
of evaluation designs in advance of actually conducting evaluations
will lead to better evaluation designs, better evaluations, and more
useful evaluative information. The paper is divided into four general
sections. First, some basic questions are considered: Why evaluate?
Why do we need evaluation designs? Why do we need a basis for judging
the adequacy of an evaluation design? Answers to these questions
serve to underscore the importance of providing a consistent basis
for judging evaluation designs. Second, a checklist of basic
considerations important in judging evaluation designs is presented.
Each component of that checklist is briefly discussed. Third, a
sample design is presented, together with an example of how the
checklist can be used in judging an evaluation design. Fourth, noted
professional educators' thoughts about judging the adequacy of
evaluation designs are presented. This fourth section is intended
especially for the reader who would like additional background based
upon current literature in the field. (Author/DEP)



***********************************************************************
* Documents acquired by ERIC include many informal unpublished
* materials not available from other sources. ERIC makes every effort
* to obtain the best copy available. Nevertheless, items of marginal
* reproducibility are often encountered and this affects the quality
* of the microfiche and hardcopy reproductions ERIC makes available
* via the ERIC Document Reproduction Service (EDRS). EDRS is not
* responsible for the quality of the original document. Reproductions
* supplied by EDRS are the best that can be made from the original.
***********************************************************************






A BASIS FOR DETERMINING THE ADEQUACY OF
EVALUATION DESIGNS 



James R. Sanders 
Dean Nafziger 



Northwest Regional Educational Laboratory 



October 1975 



U.S. DEPARTMENT OF HEALTH,
EDUCATION & WELFARE
NATIONAL INSTITUTE OF
EDUCATION



Prepared under contract support from the Alaska Department of Education 



A BASIS FOR DETERMINING THE ADEQUACY OF
EVALUATION DESIGNS 

In recent years, the educational community has widely acknowledged
the usefulness of evaluation in providing information about educational
programs, policies, and curricula; as a result, evaluation studies are
presently an expected (and often mandated) part of most educational
programs. At the same time, many evaluation studies fail dismally in
their mission of providing helpful and critical decision-making
information. Too often such failure is attributable to poor prior
planning.

The purpose of this paper is to provide a basis for judging the
adequacy of evaluation plans or, as they are commonly called, evaluation
designs. The authors assume that using the procedures suggested in this
paper to determine the adequacy of evaluation designs in advance of
actually conducting evaluations will lead to better evaluation designs,
better evaluations, and more useful evaluative information.

To assist the reader, the paper has been divided into four general
sections. Readers are encouraged to concentrate on those sections that
seem most appropriate for their needs.

First, some basic questions are considered: Why evaluate? Why do
we need evaluation designs? Why do we need a basis for judging the
adequacy of an evaluation design? Answers to these questions should
serve to underscore the importance of providing a consistent basis for
judging evaluation designs.

Second, a checklist of basic considerations important in judging
evaluation designs is presented. Each component of that checklist is
briefly discussed within this section.

Third, a sample design is presented, together with an example of
how the checklist can be used in judging an evaluation design.

Fourth, noted professional educators' thoughts about judging the
adequacy of evaluation designs are presented. This fourth section is
intended especially for the reader who would like additional background
based upon current literature in the field.

We anticipate that the primary audience for this paper will be
Alaskan educators and educational administrators, particularly project
directors and evaluators, who have to deal with evaluations frequently.
The paper is not written for a highly technical audience; the authors
recognize that many Alaskan educators, like educators everywhere,
have not had time to devote to the detailed study of measurement and
statistics. Therefore, in the interest of making the paper useful to
the widest possible readership, the criteria presented for judging
designs rely on concepts that are easily communicated or commonly known
to educators. Technical or otherwise esoteric concepts are deliberately
omitted.

Information contained in this paper can be used in two ways. First,
it can be used by evaluators as a guide in preparing (and later
reviewing and improving) their own evaluation designs. Second, project
directors can use the checklist to judge the adequacy of evaluation
designs submitted to them. Special communication needs often arise
between an evaluator and project director; evaluation designs can
facilitate clear communication and serve as a standard to assure quality
evaluation. An evaluation design provides a written record of decisions
about the evaluation to which both the evaluator and project director
can refer.



I. BASIC QUESTIONS REGARDING EVALUATION



As a preface to the checklist of criteria for determining the
adequacy of evaluation designs, a few basic questions relating to
evaluation are briefly addressed in this section. Answers to these
questions amplify the assumptions and rationale underlying this paper.¹

Why Evaluate?

Evaluation gives information about the quality of educational
programs provided to our children. Without it, we could not know whether
a curriculum was effective, whether a student was performing
satisfactorily, or whether the dollars earmarked for education were
being spent well.

Given the benefits it provides, proper evaluation is an essential
part of all education. Those benefits may include the following:

     1. Identification of strengths and weaknesses, a first step
        toward improvement.

     2. Detection of problems before correction becomes difficult
        or impossible.

     3. Identification of needs that should be addressed through
        educational action.

     4. Identification of human and other resources that can be
        used effectively in education.

     5. Documentation of desired outcomes of education.

     6. Information useful in educational planning and decision
        making.

     7. Cost information that can ultimately reduce educational
        expense.

¹ To meet the anticipated needs of the audience for this paper,
discussion of the questions is abbreviated. For a more complete
explication of some of these questions and others (e.g., When should
evaluation be done? When should an external rather than an internal
evaluation be used?), see Wright, W.J., & Worthen, B.R., Standards and
Procedures for Development and Implementation of an Evaluation Contract.
A discussion paper prepared for the Alaska Department of Education,
October 15, 1975.

Why Do We Need Evaluation Designs? 

Everyone implicitly engages in evaluation virtually every day of his
life. When buying a new coat or choosing a restaurant we make decisions
based on our evaluation of the quality of the available choices. These
evaluations are often informal and are seldom planned in terms of
procedures and outcomes. Given time constraints and the relatively low
penalties for making errors, such informal evaluations are entirely
appropriate. However, when the choices or courses of action affect
students, result in expenditures of scarce public funds, or involve
long-term commitments or benefits, the situation is different.

Carefully planned evaluation procedures, which are referred to in this
document as designs, help both the project director and the evaluator
understand the process through which a program or project will be judged.
The design also provides for the organization of resources and activities
which are required for an evaluation study.

Preparation and use of an evaluation design has benefits for both the
evaluator and the project director. Presenting an evaluation design
gives the evaluator an opportunity to communicate with project staff
concerning proposed evaluation procedures and ensure their clear
understanding of the process. At this point changes can be made without
disrupting the evaluation. For the project director and staff, an
evaluation design provides an opportunity to review the type of
information to be yielded by the evaluation so that additional or
alternative types of data collection can be suggested if necessary to
provide complete information to all users of the evaluation results.
Also, evaluation procedures can be reviewed in order to ensure that no
unexpected disruptions of the program will occur. Many misunderstandings
have occurred between evaluator and project staff, and many an
evaluation has altered in focus because a clear, systematic evaluation
design was not prepared early in the evaluation.

The advantages to completing an evaluation design early include the
following:

     1. Assuring clear and accurate direction for the study
        by establishing the uses for evaluation results,
        and by specifying expected products of the evaluation.

     2. Assuring completeness of procedures by giving others
        an opportunity to make suggestions.

     3. Identifying inconsistencies in perceptions by the
        evaluator and project director of evaluation plans so that
        these can be resolved prior to actual evaluation.

     4. Providing a clearly defined set of tasks for the
        evaluation so that attention is maintained on important
        outcomes.

     5. Assuring efficiency in the evaluation by organizing
        resources and activities. (Like any substantial educational
        undertaking, evaluation requires good management and
        accounting.)

In short, evaluation design helps the evaluator and project director
communicate clearly about the project. Because of the importance of the
design, it is critical that it be closely scrutinized and all details
discussed. Specific criteria or guidelines are particularly helpful to
clients in critically reviewing a design.

Why Do We Need A Basis for Judging the Adequacy of an Evaluation Design? 

Most school administrators have few, if any, persons on their staffs
with sufficient training and experience in evaluation to judge the
adequacy of evaluation designs solely on the basis of their own
knowledge. In addition, qualified persons are in such demand that they
are often unable to spend the time necessary to personally review all
evaluation designs used in the system. Therefore, administrators and
other educators are often left with little or no help in determining
whether designs proposed for evaluations of their programs are sound and
capable of providing useful information about those programs. Given this
situation, there is a need for written guidelines which might serve as a
basis for judging an evaluation design. Several benefits are expected to
accrue from the use of such guidelines:

     1. The guidelines should improve the quality of evaluation.
        Established guidelines should represent what is known about
        producing useful, technically correct evaluations, and their
        use should therefore preclude many errors common to
        evaluation studies.

     2. The guidelines should provide a framework for developing
        evaluation designs. Established guidelines clarify and make
        public the expectations about what a good evaluation design
        ought to include. Because they aid communication in this
        way, guidelines can be used as a basis for designing
        evaluations.

     3. The guidelines should assist administrators in monitoring
        evaluation work. The use of guidelines ensures that
        important aspects of an evaluation will be described in the
        design, and that descriptions will be specific enough to
        assist in monitoring the evaluation study.

     4. The guidelines can help address ethical considerations in
        contract evaluation work. Established guidelines help
        guarantee that aspects of the evaluation which are subject to
        questions of ethics (such as reporting procedures and
        information release and dissemination policies) will be
        considered, and relevant issues resolved, prior to the
        evaluation study. This in turn helps prevent inappropriate
        use of the evaluation results.

Ethical conduct in educational evaluation is a critical issue which
pervades much of the current literature on evaluation. Unfortunately the
scope of this paper does not permit an adequate discussion of the topic.
A comprehensive treatment of ethical standards and conduct, while in
order, must await another document devoted specifically to that issue.






II. A CHECKLIST FOR JUDGING THE ADEQUACY OF EVALUATION DESIGNS



Virtually everyone involved in any way with evaluation is concerned
with the quality of the evaluation effort. The checklist presented on
the following pages provides a basis for judging the adequacy of
evaluation designs. The checklist is divided into four general sections,
each of which covers several criteria regarding evaluation designs. Those
criteria are addressed through a set of related questions. All criteria
are more thoroughly discussed following the presentation of the
checklist.

Briefly, the four general sections are as follows. The first section
includes Criteria concerning the adequacy of evaluation planning, which
covers such issues as whether the proposed evaluation addresses all
important aspects of the program, and whether the evaluation can be
completed within existing constraints.

The second section includes Criteria concerning the adequacy of the
collection and processing of information. These questions cover the
reliability, objectivity, and representativeness of the information
obtained.

The third section, Criteria concerning the adequacy of the
presentation and reporting of information, deals with the usefulness and
completeness of the anticipated reports.

The fourth section includes General Criteria, those which deal with
ethical considerations and protocol.
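For reviewers who keep their notes electronically, the four sections and the criteria discussed under each can be written down as a simple lookup table. The following is a minimal sketch only: the names CHECKLIST_SECTIONS and all_criteria are assumptions introduced for this illustration, and the criterion names are the headings used in the discussion that follows the checklist.

```python
# Illustrative sketch only. The four checklist sections and the criteria
# discussed under each; CHECKLIST_SECTIONS and all_criteria are names
# invented for this example, not part of the original checklist form.
CHECKLIST_SECTIONS = {
    "Adequacy of evaluation planning": [
        "Scope", "Relevance", "Flexibility", "Feasibility",
    ],
    "Adequacy of the collection and processing of information": [
        "Replicability", "Objectivity", "Representativeness",
    ],
    "Adequacy of the presentation and reporting of information": [
        "Timeliness", "Pervasiveness",
    ],
    "General criteria": [
        "Ethical Considerations", "Protocol",
    ],
}


def all_criteria():
    """Return every criterion name, in checklist order."""
    return [name for names in CHECKLIST_SECTIONS.values() for name in names]


if __name__ == "__main__":
    for section, names in CHECKLIST_SECTIONS.items():
        print(f"{section}: {', '.join(names)}")
```

Keeping the sections as an ordered mapping lets a reviewer walk through the criteria in the same order in which they are discussed.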



[Checklist for Judging the Adequacy of Evaluation Designs. The checklist
pages are not legible in this reproduction. For each criterion, the form
poses one or more questions, each with the response options Yes, No, ?,
and NA, and provides a space marked "Elaboration."]

Use of the Checklist 

The checklist should be used like any other set of guidelines. Once
the design has been read thoroughly, each item on the checklist should be
considered with respect to the design. For each question related to
the criteria, one of the four available options (Yes, No, ?, Not
Applicable (NA)) should be circled, depending on whether the criterion
was adequately met.

Each question should be clearly and fully addressed by the evaluation
design. If that is the case and if the requirements of the question are
met, the reviewer should circle "Yes." For any question which is not
discussed or the requirements of the question not met, the reviewer
should circle "No." If for some reason, such as inadequate information,
it cannot be determined whether the question is appropriately answered,
the reviewer should circle "?." If a question is not applicable to a
particular evaluation, the reviewer should circle "NA."
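The circling rules just described can be summarized as a small decision procedure. The sketch below is illustrative only: the four boolean arguments are hypothetical names for judgments the reviewer makes, and the order in which the rules are tested (NA first, then "?") is an assumption of this example rather than something the checklist specifies.

```python
from enum import Enum


class Response(Enum):
    """The four options that can be circled for each question."""
    YES = "Yes"
    NO = "No"
    UNCERTAIN = "?"
    NOT_APPLICABLE = "NA"


def choose_response(applicable, determinable, addressed, requirements_met):
    """Apply the circling rules to one checklist question.

    All four arguments are booleans supplied by the reviewer; the
    argument names are hypothetical and introduced for this sketch.
    """
    if not applicable:
        # The question does not apply to this particular evaluation.
        return Response.NOT_APPLICABLE
    if not determinable:
        # For example, the design gives inadequate information.
        return Response.UNCERTAIN
    if addressed and requirements_met:
        return Response.YES
    # Not discussed, or discussed but the requirements are not met.
    return Response.NO


if __name__ == "__main__":
    print(choose_response(True, True, True, False).value)
```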

In the space marked "Elaboration" the reviewer should note any
additional comments that ought to be transmitted to the author of the
evaluation design. In particular, if a criterion was not met or if there
was some question about its being met, elaboration would be warranted.
Further, ambiguous intentions or plans seeming to require revision should
all be noted in the "Elaboration" section. Once completed, the checklist
should be given to the evaluator and to others affected by the
evaluation so that it can be used to revise the evaluation design.

There will likely be instances in which the reviewer will want to
obtain advice from another person about whether a question has been
appropriately answered. For example, this might occur when judging
information about the validity of a test or about the appropriateness of
a data collection design. The user of the checklist should always seek
and obtain advice when the content of an evaluation design or items on
the checklist prevent him from making a judgment.

It is important to remember that an evaluation design is a vehicle
for communication between an evaluator and those whose role calls for
reviewing the evaluation plan. The checklist helps organize that
communication. In cases where an evaluation is conducted by a contractor,
the design becomes a vehicle for communication between the evaluator
and the client. In such cases the checklist assists a client in judging
the adequacy of the design, and provides a basis for giving feedback to
the evaluator. If the evaluator is involved in the program being
evaluated, the guidelines provide a basis for the evaluator and his or
her colleagues to check the design.

Each major point of consideration noted in the checklist is reviewed
in the next few pages, along with information that should be covered in
an evaluation design.

CRITERIA CONCERNING THE ADEQUACY OF EVALUATION PLANNING

A. Scope. The evaluation design should include plans to collect
   information about all significant aspects of the program,
   product, or process being evaluated. If a student's performance
   is being evaluated, and the evaluation design does not
   call for collecting information about conditions that might
   adversely affect his or her performance, that oversight should
   be noted. The primary concern within this criterion is whether
   the focus of the evaluator's attention is too narrow.

B. Relevance. The design should include plans to collect
   information that addresses the concerns of those who requested the
   evaluation. For example, if a compensatory education project
   is being evaluated and the project director is concerned about
   upgrading the reading skills of children in the program, the
   evaluation design should call for collecting information about
   improvement in children's reading skills. To make the design
   relevant to the needs of the evaluation audiences, the evaluator
   should indicate the various audiences that need information and
   give the expected uses of the information. Any suggestions or
   changes concerning the information to be collected should be
   noted.

C. Flexibility. The evaluation design should be open enough to
   allow for the addition of new information gathering and processing
   activities. This is especially important in complex, long-term
   program evaluations where changes in program plans are likely. If
   a new program directed toward changing the attitudes of minority
   children toward school is just getting underway and the evaluation
   design does not allow for changes in instrumentation resulting
   from changes in program objectives, it should be noted that the
   criterion is not met, and suggested means of allowing for such
   change should be given.

D. Feasibility. The evaluation design should provide enough
   information so that the feasibility of carrying out the study can be
   determined. Many evaluation designs fail to meet this criterion.
   Feasibility can be determined on the basis of schedules, budget,
   personnel assigned to conduct specific activities, proposed
   procedures in data collection, and reporting plans. An evaluation
   design is not useful unless it can actually be implemented.

18 - 



CRITERIA CONCERNING THE ADEQUACY OF THE COLLECTION AND
PROCESSING OF INFORMATION

A. Replicability. The evaluation design should include procedures
   for assuring that the information being collected is accurate
   and that if the evaluation were replicated the same results would
   occur. Statistical reliability indices should be provided for
   standardized instruments, and procedures for determining the
   reliability of information collected by nonstandardized
   instruments should be included in the evaluation design. The reviewer
   should check the design to see whether such information is
   provided and circle the appropriate response. If the design
   provides no way to check the accuracy or replicability of
   information being collected, those concerns should be described.
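As one illustration of a procedure for checking the reliability of a nonstandardized instrument, two observers can code the same set of items independently and their agreement can then be computed. The sketch below is only an example under stated assumptions: the paper does not prescribe a particular statistic, Cohen's kappa is simply one commonly used index chosen here, and the function names and sample codes are invented.

```python
from collections import Counter


def percent_agreement(rater_a, rater_b):
    """Share of items on which two observers gave the same code."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("ratings must be non-empty and of equal length")
    matches = sum(a == b for a, b in zip(rater_a, rater_b))
    return matches / len(rater_a)


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa: observed agreement corrected for chance agreement."""
    n = len(rater_a)
    observed = percent_agreement(rater_a, rater_b)
    counts_a = Counter(rater_a)
    counts_b = Counter(rater_b)
    # Chance agreement: product of each observer's marginal proportions,
    # summed over the codes.
    expected = sum(counts_a[code] * counts_b[code] for code in counts_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both observers used one identical code throughout
    return (observed - expected) / (1.0 - expected)


if __name__ == "__main__":
    # Invented codes from two observers rating the same four sessions.
    first = ["on", "on", "off", "off"]
    second = ["on", "off", "off", "off"]
    print(percent_agreement(first, second), cohens_kappa(first, second))
```

A design that names such a check in advance gives the reviewer something concrete to look for when answering the replicability questions.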

B. Objectivity. The evaluation design should incorporate
   procedures to control for biases; those biases that may affect an
   evaluator's collection or interpretation of information should be
   clearly labeled and minimized. Methods for maintaining fairness
   and objectivity (such as the use of external data collectors,
   objective and unbiased instrumentation, or interpretation panels for
   reporting findings) should be incorporated into an evaluation
   design whenever possible. If the reviewer has concerns about
   inherent bias in the evaluation design, those concerns should be noted
   and discussed with the evaluator.

C. Representativeness. The information to be collected should
   accurately represent the program or project being evaluated. Data
   collection instruments should be valid, and they should obtain
   information that bears upon all the evaluation questions. Information
   about all significant aspects of the program should be reported.
   Sampling procedures are often used when the amount of information
   needed for a complete picture becomes too unwieldy. When this is
   done, representative samples should be selected.
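One way a design might make its sampling plan concrete is to draw the same fraction of units from each subgroup of the program, so that every subgroup is represented in proportion to its size. The following is a minimal sketch under stated assumptions: proportional stratified sampling is only one of several acceptable procedures, and the function name, the strata, and the 10 percent fraction are invented for illustration.

```python
import random


def stratified_sample(units_by_stratum, fraction, seed=0):
    """Draw the same fraction of units from every stratum.

    `units_by_stratum` maps a stratum label (for example, a school) to a
    list of units (for example, student numbers). At least one unit is
    drawn from each non-empty stratum.
    """
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be greater than 0 and at most 1")
    rng = random.Random(seed)  # fixed seed so the draw can be repeated
    sample = {}
    for stratum, units in units_by_stratum.items():
        if not units:
            sample[stratum] = []
            continue
        k = max(1, round(len(units) * fraction))
        sample[stratum] = rng.sample(units, k)
    return sample


if __name__ == "__main__":
    # Invented example: two schools of different sizes, 10 percent sample.
    population = {"School A": list(range(100)), "School B": list(range(100, 120))}
    for school, chosen in stratified_sample(population, 0.10).items():
        print(school, len(chosen))
```

Recording the seed and the fraction in the design also supports the replicability criterion, since the same draw can be reproduced later.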

CRITERIA CONCERNING THE ADEQUACY OF THE PRESENTATION
AND REPORTING OF INFORMATION

A. Timeliness. The evaluation design should describe how reports and
   other presentations fit into the schedule for decision making.
   Report deadlines should reflect the informational needs of the persons
   to whom the presentations are directed. The design should contain a
   reporting schedule and content descriptions of reports or other
   presentations, and show the relationship to the decision-making
   schedule.
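The relationship between the reporting schedule and the decision-making schedule can be checked mechanically once both are written down. The sketch below is an illustration only: the data layout (one report per decision), the function name, and the dates are all assumptions invented for this example.

```python
from datetime import date


def untimely_decisions(report_due, decision_dates):
    """List decisions whose report is missing or due after the decision.

    Both arguments map a decision label to a date. The labels and this
    one-report-per-decision layout are assumptions made for the sketch.
    """
    problems = []
    for decision, needed_by in decision_dates.items():
        due = report_due.get(decision)
        if due is None or due > needed_by:
            problems.append(decision)
    return problems


if __name__ == "__main__":
    # Invented schedule for illustration.
    decisions = {
        "refunding": date(1976, 3, 1),
        "staffing": date(1976, 5, 15),
    }
    reports = {
        "refunding": date(1976, 2, 15),
        "staffing": date(1976, 6, 1),
    }
    print(untimely_decisions(reports, decisions))
```

Any decision returned by such a check would be a natural item to note under "Elaboration."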

B. Pervasiveness. The evaluation design should call for the delivery of
   reports or presentations to all relevant audiences. These include any
   persons or groups that affect or are affected by the evaluation itself
   or the object of the evaluation. Suggestions about the distribution of
   evaluation information should be recorded under "Elaboration."

GENERAL CRITERIA

A. Ethical Considerations. The evaluation design should cover whatever
   ethical considerations may be of concern. In some cases certain
   information obtained through the evaluation may be confidential, and
   steps to protect confidentiality should be included in the design. An
   evaluator should also be aware that some data collection procedures,
   such as use of peer informers, may be threatening to subjects, and
   such practices should be avoided. Additional ethical considerations
   not addressed within the design should be noted under "Elaboration."

B. Protocol. The evaluation design should include some consideration
   of protocol. For example, it is often necessary to obtain a
   superintendent's permission to talk to a building principal or teacher
   before actually contacting that person. In many cases, it is
   professional courtesy to request permission to use the work of others
   before referencing it. In all phases of information collection and
   reporting, strict protocol should be observed.

Summarizing the Information Contained in the Checklist

After considering each question on the checklist, a reviewer will
have a series of circled responses in one column and a number of
comments in the other. "No" or "?" responses indicate a need for
additional information. Comments in the "Elaboration" section will
provide a basis for making various sorts of improvements in the design.
In short, the information from the checklist summarizes for the
evaluator what changes are needed to make the evaluation design
acceptable.
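The summary described above amounts to pulling out the "No" and "?" responses and gathering the elaboration comments. A minimal sketch follows, assuming a completed checklist is stored as a mapping from a question label to a (response, elaboration) pair; that layout, the function name, and the sample entries are invented for illustration.

```python
def summarize(review):
    """Split a completed checklist into follow-up lists.

    `review` maps a question label to a (response, elaboration) pair,
    where response is one of "Yes", "No", "?", or "NA". This layout is
    an assumption made for the sketch.
    """
    # "No" and "?" responses indicate a need for additional information.
    needs_information = [
        question for question, (response, _) in review.items()
        if response in ("No", "?")
    ]
    # Non-empty elaborations are the basis for improving the design.
    comments = {
        question: note for question, (_, note) in review.items() if note
    }
    return needs_information, comments


if __name__ == "__main__":
    # Invented example of a partly completed review.
    example = {
        "Scope": ("Yes", ""),
        "Feasibility": ("No", "No budget or schedule is given."),
        "Objectivity": ("?", "Who will collect the data?"),
        "Protocol": ("NA", ""),
    }
    needs, notes = summarize(example)
    print("Needs additional information:", needs)
    for question, note in notes.items():
        print(f"  {question}: {note}")
```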

Whenever evaluation is conducted under contract, the evaluation
design becomes an important focus of communication among the evaluator, his
staff, and the client. Modifying the design to make it acceptable to
both sides can aid that communication process. Should irreconcilable
differences arise between evaluator and client, one alternative is to
terminate the relationship; another is to bring in an objective outsider
to negotiate changes. In most cases, however, differences can be resolved
through design modification.

The following section of the paper provides a sample application of
the checklist; that sample application is intended to clarify concepts
described in this section. The reader is encouraged to gain experience
in using the checklist by first applying it to the design, and then
comparing his results with those of the authors.






III. EXAMPLE APPLICATION OF THE CHECKLIST TO AN EVALUATION DESIGN 

The checklist for judging evaluation designs that is given in the
previous section is to be used as a tool to help identify strengths
and weaknesses in an evaluation design. Identified weaknesses can then be
corrected before the evaluation begins.

In this section, the checklist is applied to a fictitious evaluation
design. There are two parts to this section of the paper. The
first is a short, fictitious evaluation design. This design is not
intended to represent any actual evaluation study in Alaska or elsewhere.
Any resemblance to an existing evaluation study in Alaska is purely
coincidental. Rather, the design represents the type of evaluation
designs frequently encountered by project directors and other
administrators. The design is neither all good nor all bad. As will be seen, it
contains some components that are entirely adequate and others that
require improvement.

The second part of this section of the paper is the actual appli- 
cation of the checklist. Each question in the checklist is answered for 
the fictitious design, and an explanation of each answer is given.




EVALUATION DESIGN FOR THE HARTMAN READING PROGRAM
FOR FIVE BOROUGHS



Introduction 

In recent years reading instruction has become a major target area
for education not only in Alaska but throughout the United States. As
a result of this emphasis, several new reading programs, textbooks,
and instructional materials have been developed.

Recently, one of these new programs, the Hartman Reading Program,
was adopted jointly by five Alaskan boroughs: Elk Mountain, Donelly,
Banks, Karnaska, and Port. The Hartman Reading Program is appropriate
for students in grades one through six. It was selected because it
had been developed for use in a variety of cultural settings, and because
it purported to improve the self-concept of students from minority
cultural groups. The expense involved in adopting the Hartman Reading
Program was too much to be borne by any one borough alone, but a joint
effort made adoption feasible.

The purpose of this evaluation is to determine whether the Hartman
Program is fulfilling the goals which the five boroughs have set for
new reading programs.

Program Goals and Evaluation Questions

The five-borough Planning Committee which selected the Hartman Reading
Program has established four goals that any new reading program within
those boroughs is expected to attain. These four goals are listed below
along with several associated evaluation questions.

Goal 1: Children in the program will achieve in all reading subjects
at a rate commensurate with their own age, ability, and grade level.

     Question 1.1: How does the performance of children in the new
                   program, as measured on a standard reading achievement
                   test, compare to that of other children in the United
                   States at the same grade level?

     Question 1.2: How does the performance of children in the new
                   program, as measured on standard reading achievement
                   tests, compare to the performance of children in the
                   district in past years?

     Question 1.3: How does the performance of children in the new
                   program compare to that of children in the old reading
                   program?

Goal 2: Children in the new program will demonstrate growth in
self-esteem and improvement in self-concept.

     Question 2.1: How do children in the new program compare with
                   children in the old program in measures of self-esteem
                   and self-concept?

Goal 3: All teachers and staff members of participating classrooms
will be involved in a comprehensive inservice training program.

     Question 3.1: What percentage of teachers and staff members from
                   participating classrooms have taken the voluntary
                   training program?

     Question 3.2: To what extent do teachers and staff members express
                   satisfaction with the training program?

Goal 4: Parents will be involved in the implementation of the new
program.

     Question 4.1: What percentage of parents of students in participating
                   classrooms become involved in the classroom activities
                   designed for parents?




Audiences for the Evaluation



The primary audience for the evaluation is the Planning Committee
for the five boroughs. Based upon the results of the evaluation, the
Planning Committee will decide to adopt the Hartman Reading Program
throughout the five boroughs, or to eliminate use of the program.
That decision will be made in July.

One secondary audience for the evaluation is teachers throughout
the boroughs. Data collected during the pretest can be used by teachers
to diagnose reading difficulties and poor self-concepts in students.

Another secondary audience consists of project directors, evaluators,
and other educators throughout the state who would like information about
the Hartman Reading Program or about the evaluation procedures used
in this study.

Data Collection Design for the Hartman Reading Program

In order to allow for classroom differences while making necessary
comparisons, a pre-post-test, treatment-control group design was developed.
Students in the new program are designated the treatment or experimental
group, and those in the regular school program are considered the control
group. Three alternative methods for gathering comparative data have
been designed. Each of these designs depends on random assignment of
students or classrooms to treatment and comparison groups at the
beginning of the school year. The alternatives are listed below in order of
desirability. Since more desirable designs may also be more difficult
to implement, the most desirable alternative that can be implemented
within the constraints imposed by the school situation will be chosen.

Alternative I: Random Assignment of Students Within Classrooms

This experimental design allows for random assignment of students
to program and control groups within classrooms. This design is based
on the assumption that such assignments are acceptable to teachers,
and that the two reading programs can be implemented in each classroom.

Student Selection Procedure

     1. Determine, by grade and classroom, the number of students who
        would participate in the program.

     2. Make an alphabetical list, by classroom, of students who may
        be selected to fill program to capacity. (This list should
        contain twice the number of students needed to fill program
        quota.)

     3. Alternately assign names to program and control groups in each
        classroom, as follows: first name on list to program; second
        name to comparison group; third name to program; fourth
        name to comparison, etc.
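
Step 3 of the procedure above can be sketched in a few lines. This is a minimal illustration in Python under stated assumptions: the function name `alternate_assign` and the roster are hypothetical, and the sketch simply follows the alternation rule as written.

```python
def alternate_assign(names):
    """Alternate names on an alphabetical list between program and comparison.

    First name goes to the program, second to the comparison group,
    third to the program, and so on, as in step 3 of the procedure.
    """
    ordered = sorted(names)        # alphabetical list for one classroom
    program = ordered[0::2]        # 1st, 3rd, 5th, ... names
    comparison = ordered[1::2]     # 2nd, 4th, 6th, ... names
    return program, comparison


# Hypothetical classroom roster, for illustration only.
roster = ["Dana", "Avery", "Casey", "Blake"]
program, comparison = alternate_assign(roster)
print(program)      # ['Avery', 'Casey']
print(comparison)   # ['Blake', 'Dana']
```

Note that the alternation is fully determined once the list is sorted. Any chance element would have to come from elsewhere (for example, shuffling the list before alternating), which the design does not specify.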

Alternative II: Random Assignment of Classes

The second alternative involves the random assignment of entire
classes to treatment and control groups. It assumes that several classes
of students at each grade level can adopt the new program or remain
with the old one.

Classroom Selection Procedure

     1. Determine, by grade, the number of students who would participate
        in the new program.

     2. Prepare a list, by grade, of classes which would participate
        in the program. Assign a number to each classroom on the list.

     3. Use a random number table to select classes to participate in
        the treatment group, and choose half of the classes for that
        purpose. The remainder will constitute the control group.
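
The classroom selection procedure above can be sketched as follows. This is a minimal illustration in Python, assuming a pseudo-random generator stands in for the printed random number table; the function name `select_treatment_classes`, the seed, and the class numbers are hypothetical.

```python
import random


def select_treatment_classes(class_numbers, seed=None):
    """Pick half of the numbered classes at random for the treatment group.

    The remaining classes form the control group. A seeded generator is
    used here in place of a random number table so the draw can be repeated.
    """
    rng = random.Random(seed)
    n_treatment = len(class_numbers) // 2
    treatment = set(rng.sample(class_numbers, n_treatment))
    control = [c for c in class_numbers if c not in treatment]
    return sorted(treatment), control


# Hypothetical list of six numbered classes at one grade level.
classes = [1, 2, 3, 4, 5, 6]
treatment, control = select_treatment_classes(classes, seed=1975)
print(treatment, control)
```

Recording the seed (or the table page and starting row, when a printed table is used) lets a reviewer verify that the assignment was made as described.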

Alternative III: Teacher Selection of Program

This alternative allows teachers to choose whether they would like
to participate in the new program or keep using the old one. The selection
procedure simply involves allowing teachers to choose according to their
preferences.

The comparison design will be used to determine the effects of the
Hartman Reading Program in the areas of reading performance and
self-concept. Statistical techniques appropriate for the design chosen will
be used. Comparative analysis of differences in performance on all
pre-post-tests will be included in the design. The specific question
answered here is whether children in the program are learning
significantly more than comparable children not in the program.
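
The design does not name the statistical technique to be used. As one possible sketch only, the comparison of pre-post differences could be expressed as gain scores compared with Welch's two-sample t statistic. The function names and all scores below are hypothetical, and the choice of Welch's statistic is an assumption, not something stated in the design.

```python
from math import sqrt
from statistics import mean, variance


def gain_scores(pre, post):
    """Posttest minus pretest for each student (lists in the same order)."""
    return [after - before for before, after in zip(pre, post)]


def welch_t(group_a, group_b):
    """Welch's t statistic for the difference between two group means."""
    standard_error = sqrt(variance(group_a) / len(group_a)
                          + variance(group_b) / len(group_b))
    return (mean(group_a) - mean(group_b)) / standard_error


# Invented scores for four students per group, for illustration only.
treatment_gain = gain_scores(pre=[40, 42, 38, 45], post=[45, 49, 44, 53])
control_gain = gain_scores(pre=[41, 39, 44, 40], post=[44, 43, 46, 43])
t = welch_t(treatment_gain, control_gain)
print(round(t, 3))
```

Judging significance would additionally require the degrees of freedom and a reference distribution, and the appropriate unit of analysis (student or classroom) depends on which assignment alternative is actually used.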


Reporting Procedures 

Three types of reports will be prepared: a Teacher Report for each
teacher, an Administrative Report, and a Technical Report.

A Teacher Report will be compiled for each teacher's classroom,
summarizing pretest data for the classroom. The teacher feedback report
will include:

     . Tables (two per class) showing scores, percentiles, and stanines
       for each pupil on each test.

     . Tables (two per class) of profiles showing graphically the
       percentile equivalents of the average score for each test and
       comparison of each child with his class, with children in
       other classes, and with students at the same grade level in
       other schools tested.

     . Local norms and standardization as given in administrative
       feedback report.

     . An interpretive guide for using the data provided.
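
The Teacher Report tables list both percentiles and stanines. As a rough sketch, a percentile rank can be mapped to a stanine using the conventional band boundaries (4, 7, 12, 17, 20, 17, 12, 7, and 4 percent of cases). The design does not say which conversion the test publishers use, so the cutoffs and the function name `stanine` below are assumptions for illustration.

```python
from bisect import bisect_right

# Conventional cumulative percentile cutoffs between stanines 1..9.
# These are an assumption; a publisher's norm tables take precedence.
STANINE_CUTOFFS = [4, 11, 23, 40, 60, 77, 89, 96]


def stanine(percentile_rank):
    """Map a percentile rank (0-100) to a stanine from 1 to 9."""
    return bisect_right(STANINE_CUTOFFS, percentile_rank) + 1


for rank in (3, 50, 96):
    print(rank, stanine(rank))
```

In practice the interpretive guide would rely on the publisher's own conversion tables rather than a fixed rule like this one.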

The Administrative Report will include a summary of the comparison
study results. The effects of the Hartman Reading Program in comparison
with the standard program will be summarized and interpretations given.
The Technical Report will include: 

     . Detailed description of data-collecting methods and procedures.
     . Detailed description of procedures used in data analysis
       throughout the project.
     . Summary tables as presented in administrative feedback.
     . Item analysis of all tests used in project.
     . Norms on all tests used in project.

The Administrative Report and Technical Report will be reviewed by
a panel of teachers, administrators, State Department of Education personnel,
and university educators to determine the accuracy, fairness, and
impartiality of the reports. Reports will be revised on the basis of those
reviews, and, if consensus is not reached, an addendum giving the opposing
interpretations will be attached.

Description of Program and Comparison Treatment



Program groups will receive reading instruction as described in
the Hartman Reading Program Guide for Instruction. The Guide gives a
detailed account of materials to be used, involvement of parents,
sequencing of concepts, and time required for each activity. The Guide
also provides the philosophical underpinning of the program, general
program objectives, and settings in which the program should be used.
Because the Guide is readily available, the program description is
not repeated in this design. The comparison group will receive
instruction in the usual curriculum offered in the five boroughs. Because
the same curriculum is used in each of the boroughs, no further
standardization of treatment will be required. A detailed description of the
standard curriculum and its implementation is provided in the
Curriculum Guide.



Testing Instruments 



Tests were chosen to measure important reading skills being taught
in the reading programs of the boroughs. These skills encompass
listening and writing as well as more typical reading skills. In addition,
a test of self-esteem is included. The tests chosen (the Sequential
Tests of Educational Progress, the Multicultural Reading Series, and the
Self-Observation Scale) are described on the following pages.

The Sequential Tests of Educational Progress (STEP) are
achievement-oriented tests. These instruments measure the broad outcomes of general
education, focusing on the ability to solve new problems on the basis of
information learned as opposed to "lesson material."
The STEP instruments provide for continuous measurement of skills over
nearly all of the years of general education; therefore, they measure
more of the cumulative effect of instruction.

The STEP Listening tests were designed to measure a student's
ability to understand, interpret, apply and evaluate what he listens to.
The listening skills are broken down into sub-abilities which are
classified as follows: plain-sense comprehension, interpretation, evaluation
and application.

The STEP Listening tests include typical examples of what might
actually be said to students in a school situation. Each test includes
materials of the following types: direct and simple explanation,
exposition, narration, argument and persuasion, and aesthetic material (both
poetry and prose).

These tests are available for grade four to college sophomore level.
They are subdivided into four levels of difficulty to provide for a wide
range of abilities.

STEP Listening test interpretation begins with a score which is
translated into percentiles through the use of normed tables. The
publisher also provides national norms for comparing a student's scores
with those of a nationwide sample of students at the same educational
level. Directions for constructing local STEP norms are provided.

The STEP Writing test measures ability to think critically in
writing, organizing materials, choosing appropriate materials to write
effectively, and using appropriate, conventional punctuation and grammar.
The materials chosen were those from actual student writing excerpted
from letters, newspapers, answers to test questions, reports, stories,
notes, outlines, questionnaires and directions.

The STEP Writing test is based on the same criteria as the
listening test. Norms were formulated in the manner described in the
listening section.

The tests of reading in the Multicultural Reading Series are
designed to measure both vocabulary and comprehension. At grade levels beyond
primary one, comprehension is measured by two subtests: speed of
comprehension, and level of comprehension.

Scores on the tests of reading may be used not only as measures of
achievement in reading itself, but also as bases for estimating ability
to achieve. In grouping children and adjusting instruction to individual
differences, a measure of reading ability is often useful as a measure of
mental ability. After a child has learned to read, the use of both
measures is much better than the use of either one alone.

The test was constructed by the Testing Research Associates (1962) 
especially for multicultural student populations. Administration time 
varies from 30 to 50 minutes. Given specific instructions, a teacher 
may administer the test successfully. 

The technical report of the series presents an average parallel 
test reliability of .87 and an average correlation of .78 with the STEP;
this indicates a relatively high concurrent validity. 

The Self-Observation Scales (SOS) is a direct, self-report,
group-administered instrument comprising 45 items (Forms A and B) designed to
measure five dimensions of children's affective behavior:
self-acceptance, social maturity, school affiliation, self-security and
achievement motivation. The SOS has been translated into various languages
including Spanish, Italian, Chinese, Greek, Korean, Japanese, Tagalog
and Arabic.

The Technical Bulletin (No. 1) for the SOS reports the following 
split-half reliability values (N=4144): 



           Self-        Social     School        Self-      Achievement
           Acceptance   Maturity   Affiliation   Security   Motivation

Form A     .75          .77        .76           .81        Not Available (NA)

Form B     .79          .79        .79           .81        NA
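
For readers unfamiliar with split-half reliability, the following sketch shows one common way such a value is obtained: correlate scores on the odd-numbered and even-numbered items, then apply the Spearman-Brown correction. Whether the SOS Technical Bulletin used this particular split or correction is not stated in the design, so the procedure, the function names, and the item data below are all assumptions for illustration.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def split_half_reliability(item_scores):
    """Odd-even split-half reliability with the Spearman-Brown correction.

    item_scores holds one list of item scores per respondent.
    """
    odd_half = [sum(row[0::2]) for row in item_scores]    # items 1, 3, 5, ...
    even_half = [sum(row[1::2]) for row in item_scores]   # items 2, 4, 6, ...
    r = pearson_r(odd_half, even_half)
    return 2 * r / (1 + r)


# Invented responses: five respondents, four items scored 0 or 1.
responses = [
    [1, 1, 1, 1],
    [1, 0, 1, 0],
    [0, 0, 0, 0],
    [1, 1, 0, 1],
    [0, 1, 0, 0],
]
print(split_half_reliability(responses))
```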

Intersubscale correlations are reported as follows (N=4144):

                      Self-        Social     School        Self-      Achievement
                      Acceptance   Maturity   Affiliation   Security   Motivation

Self-Acceptance       -            .06        .48           .18        NA

Social Maturity                    -          .34           .58        NA

School Affiliation                            -             .35        NA

Self-Security                                               -          NA



Content validity is assured by publishers at the Institute for De- 
velopment of Educational Auditing. 


The validation and norming sample includes students from 150 schools
nationwide. In drawing the sample, particular attention was paid to
the social, geographic, and socioeconomic characteristics of the
participating schools. The norm group was composed of 9,030 students at
K-3 levels.


According to the publishers, "The SOS differs from other similar
instruments in (a) the extensive validation study which has accompanied
the national norming effort, (b) the emphasis on the healthy and
positive, rather than pathological and negative dimensions of children's
affective behavior, and (c) the practical decision-making orientation
rather than a research, theoretical orientation."



Other Data Collection Forms

Data about the participation of teachers and staff members in
inservice programs will be collected from the records of inservice
instructors. The satisfaction of teachers with the training will be
measured using the Training Satisfaction Questionnaire (TSQ). The TSQ
has been used frequently in the boroughs. It consists of 20 questions
about the training, and has adequate reliability (KR-20 coefficient =
.83) for this type of questionnaire.
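
The KR-20 coefficient cited above is the Kuder-Richardson Formula 20, which applies to items scored dichotomously (0 or 1): KR-20 = (k / (k - 1)) * (1 - sum(p * q) / variance of total scores), where k is the number of items, p is the proportion answering an item positively, and q = 1 - p. The sketch below is a minimal illustration; the function name `kr20` and the responses are invented, and the population form of the variance is used here as an assumption, since conventions differ.

```python
from statistics import pvariance


def kr20(responses):
    """Kuder-Richardson Formula 20 for 0/1-scored items.

    responses holds one list of item scores per respondent.
    """
    n_respondents = len(responses)
    n_items = len(responses[0])
    totals = [sum(row) for row in responses]
    sum_pq = 0.0
    for i in range(n_items):
        p = sum(row[i] for row in responses) / n_respondents
        sum_pq += p * (1 - p)          # item variance for a 0/1 item
    return (n_items / (n_items - 1)) * (1 - sum_pq / pvariance(totals))


# Invented responses: four respondents, three items.
sample = [
    [1, 1, 1],
    [1, 1, 0],
    [1, 0, 0],
    [0, 0, 0],
]
print(kr20(sample))   # 0.75
```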

Participation of parents in classroom activities will be determined
using a form to be filled out by teachers and a questionnaire to be
sent to parents. Information from these two instruments will be
cross-checked and discrepancies resolved by the evaluation team with
follow-up correspondence.
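
The cross-check described above can be sketched as a comparison of the two sources record by record. This is a minimal illustration in Python; the function name `find_discrepancies`, the parent identifiers, and the data layout are hypothetical, since the design does not describe the forms in detail.

```python
def find_discrepancies(teacher_form, parent_questionnaire):
    """Return parent IDs whose participation is reported differently.

    Each argument maps a parent ID to True/False (participated or not).
    A parent missing from one source is also flagged for follow-up.
    """
    all_ids = set(teacher_form) | set(parent_questionnaire)
    return sorted(
        pid for pid in all_ids
        if teacher_form.get(pid) != parent_questionnaire.get(pid)
    )


# Invented records for illustration only.
teacher_form = {"P01": True, "P02": False, "P03": True}
parent_questionnaire = {"P01": True, "P02": True, "P04": False}
print(find_discrepancies(teacher_form, parent_questionnaire))
# ['P02', 'P03', 'P04']
```

The flagged IDs are the cases the evaluation team would resolve through follow-up correspondence.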

Procedure Clearance Steps 


All data collection activities, teacher training workshops, evaluation
questionnaires, and mass communication strategies will be submitted to
the chief school officer in each borough for approval prior to use.
Procedures for implementing any evaluation plans will be determined
jointly with the chief school officer.

Evaluation Activities Time Line

September

     Select treatment and control groups
     Request student names and identification numbers
     Deliver test materials to schools
     Conduct pretest evaluation inservice

October

     Submit completed student I.D. blanks to evaluation unit
     Administer pretests
     Pick up completed pretests from schools
     Visit schools (evaluation team)

November

     Administer listening tests
     Mail student information blank to schools
     Complete and deliver individual Teacher Reports

December

     Begin class observation schedule
     Submit completed student information blank to evaluation unit
     Classroom observation schedule (ongoing)

January

     Monitor experimental/comparison groups and continue classroom
     observations
     Conduct evaluation conference for parents/advisory council
     members
     Classroom observation (ongoing)

February

     Participate in visits to schools
     Classroom observation (ongoing)

March

     Continue classroom observations and monitoring of experimental/
     comparison groups
     Continue participation in visits to schools
     Classroom observation (ongoing)

April

     Mail parent/teacher/administrator questionnaires
     Conduct posttest inservice
     Classroom observation (ongoing)
     Questionnaires due in the evaluation unit by the end of the
     month

May

     Deliver posttest materials to schools
     Posttest administration
     Completed posttests to be picked up

June

     Technical Report and Administrative Report completed

July

     Use of reports for adoption or elimination of the use of the
     Hartman Reading Program




Use of the Checklist with the Fictitious Evaluation Design 



In this section the fictitious evaluation design is reviewed to
demonstrate the use of the checklist in determining the adequacy of an
evaluation design. The rationale for each response is provided
immediately following each set of questions on the checklist. These
elaborations are somewhat longer than would be provided by most users of the
checklist.



I. Regarding the Adequacy of the
   Evaluation Conceptualization

A. Scope: Does the range of information to be provided include
   all the significant aspects of the program or product being
   evaluated?

     1. Is a description of the program or product
        presented (e.g., philosophy, content,
        objectives, procedures, setting)?              (Yes)   No    ?    NA

     2. Are the intended outcomes of the program
        or product specified, and does the
        evaluation address them?                       (Yes)   No    ?    NA

     3. Are any likely unintended effects from
        the program or product considered?              Yes   (No)   ?    NA

     4. Is cost information about the program
        or product included?                            Yes   (No)   ?    NA

The criterion of Scope seems to be only partially met in the design.
The first two of the four questions can be answered with a "yes." The
design does include a description of the Hartman Reading Program, although
it is done by referencing the Guide for Instruction for the program
(see page 7).

Note that here, as will always occur with the use of any
checklist, the user's professional judgment must guide decisions about how
well questions have been answered and criteria met. Some users may wish
the program description from the Guide to be included in the design as
an appendix or in the text itself before a "Yes" is circled. This is
certainly justified. The important point is that provision be made to
give an adequate description of the program to those who need it.

Also, the objectives of the program are given through a series
of questions that relate to the general goals of the Planning Committee
(see pages 1 and 2).

On the last two questions the evaluation design does not fare as
well. No provision is made for any unintended effects that might occur
from the use of the program. Neither is any information given about
the cost of the program. In order for the criterion of Scope to be
adequately met, the two types of missing information should be included.

B. Relevance: Does the information to be provided adequately serve
   the evaluation needs of the intended audiences?

     1. Are the audiences for the evaluation
        identified?                                    (Yes)   No    ?    NA

     2. Are the objectives of the evaluation
        explained?                                     (Yes)   No    ?    NA

     3. Are the objectives of the evaluation
        congruent with the information needs of
        the intended audiences?                        (Yes)   No    ?    NA

     4. Does the information to be provided allow
        necessary decisions about the program
        or product to be made?                         (Yes)   No    ?    NA

The evaluation design has adequately met the criterion of Relevance.
Primary and secondary audiences were identified (page 3). The
objectives of the evaluation were delineated in a set of questions that
followed from the information needs of the primary audience. Further,
decisions about the program can be made on the basis of the answers to the
evaluation questions.


C. Flexibility: Does the evaluation study allow for new information
   needs to be met as they arise?

     1. Can the design be adapted easily to
        accommodate new needs?                         (Yes)   No    ?    NA

     2. Are known constraints on the evaluation
        discussed?                                      Yes   (No)   ?    NA

     3. Can useful information be obtained in the
        face of unforeseen constraints, e.g.,
        noncooperation of control groups?              (Yes)   No    ?    NA

The evaluation design seems to be reasonably successful regarding
the criterion of Flexibility. It seems that the proposed
evaluation would be able to accommodate new information needs because
several data collection procedures and instruments are to be employed.
In general, an evaluation that uses several procedures is more
flexible than an evaluation that relies heavily on one or two methods or
instruments. Another strength of the design is that there is a set
of alternatives for gathering comparative data. Selection of groups for
a comparison study is typically an area in which some flexibility is
needed.

A weakness regarding the Flexibility criterion is that there is
no discussion of the constraints on the study. Nearly all evaluation
studies are subject to constraints of various degrees of importance,
and they should be explained in the design.



D. Feasibility: Can the evaluation be carried out as planned?

     1. Are the evaluation resources (time, money
        and manpower) adequate to carry out the
        projected activities?                           Yes    No   (?)   NA

     2. Are management plans specified for
        conducting evaluation?                          Yes   (No)   ?    NA

     3. Has adequate planning been done to support
        the feasibility of particularly difficult
        activities?                                     Yes    No    ?    NA

The adequacy of the evaluation design as it relates to the
Feasibility criterion is in question. The available resources to conduct
the study are not given, and so no judgment can be made about their
adequacy. There is no management plan which lists the major tasks,
time required to complete tasks, or personnel. Also, there is only
little evidence that particularly difficult tasks are feasible. Clearly,
more information relating to the feasibility of the study is needed.



II. Criteria Concerning the Adequacy of
    the Collection and Processing of
    Information



A. Reliability: Is the information to be collected in a manner such
   that findings are replicable?

     1. Are data collection procedures described
        well enough to be followed by others?          (Yes)   No    ?    NA

     2. Are scoring or coding procedures
        objective?                                     (Yes)   No    ?    NA

     3. Are the evaluation instruments reliable?       (Yes)   No    ?    NA

Adequate information supporting the Replicability criterion seems
to be included. The tests and questionnaires to be used in the study
are described in adequate detail, and their reliability is shown to
be sufficiently high (pages 7ff). In the one instance where low
reliability of data may occur (teacher and parent reports of parent
involvement), the data are to be cross-checked (page 11).



B. Objectivity: Have attempts been made to control for bias in data
   collection and processing?

     1. Are sources of information clearly
        specified?                                     (Yes)   No    ?    NA

     2. Are possible biases on the part of data
        collectors adequately controlled?               Yes    No    ?   (NA)



The Objectivity criterion seems to have been met. It is clear
from whom each type of data will be collected. Further, there do not
seem to be any particular threats to the objectivity of the data, and
so no special controls are required. Hence, the "NA" for the second
question.






C. Representativeness: Do the information collection and processing
   procedures ensure that the results accurately portray the program
   or product?

     1. Are the data collection instruments
        valid?                                          Yes    No    ?    NA

     2. Are the data collection instruments
        appropriate for the purposes of this
        evaluation?                                     Yes    No    ?    NA

     3. Does the evaluation design adequately
        address the questions it was intended
        to answer?                                      Yes    No    ?    NA



The Representativeness criterion has not been met satisfactorily
in this design. The inadequacies with respect to this criterion are
brought to light by the first two questions. First, the validity of
the achievement tests is open to question. No information about the
validity of the Sequential Tests of Educational Progress is provided,
although such information may well be available. Some validity
information is given for the Multicultural Reading Series (page 9). For the
Self-Observation Scale, only an ambiguous statement about validity is
given (page 10).






III. Criteria Concerning the Adequacy
     of the Presentation and Reporting of
     Information

A. Timeliness: Is the information provided timely enough to be of
   use to the audiences for the evaluation?

     1. Does the time schedule for reporting meet
        the needs of the audiences?                    (Yes)   No    ?    NA

     2. Is the reporting schedule shown to be
        appropriate for the schedule of decisions?     (Yes)   No    ?    NA



The evaluation design clearly meets the criterion of Timeliness.
The needs of the audiences were taken into account and a reporting
schedule was developed consistent with those needs (page 13).

B. Pervasiveness: Is information to be provided to all who need it?

     1. Is information to be disseminated to all
        intended audiences?                            (Yes)   No    ?    NA

     2. Are attempts being made to make the
        evaluation information available to
        relevant audiences beyond those directly
        affected by the evaluation?                     Yes    No    ?    NA

The Pervasiveness criterion is met partly in that the intended
audiences for the evaluation are to receive adequate information. However,
there are possible unintended audiences that have been largely ignored.
The only report to be made available on a broad scale is the Technical
Report. Other people who might benefit from information
from the evaluation should be considered, and an appropriate report
should be written for them. For example, a general summary of the major
effects of the Hartman Reading Program would probably be useful
information for many superintendents and principals to have.

IV. General Criteria

A. Ethical Considerations: Does the intended evaluation study
   strictly follow accepted ethical standards?

     1. Do test administration procedures follow
        professional standards of ethics?              (Yes)   No    ?    NA

     2. Have protection of human subjects
        guidelines been followed?                       Yes   (No)   ?    NA

     3. Has confidentiality of data been
        guaranteed?                                     Yes   (No)   ?    NA



The criterion of Ethical Considerations does not seem to have
been completely met. There is nothing to suggest that the evaluator
will engage in any unethical conduct, but neither is there information
to suggest that the evaluator has considered all of the ethical
problems that can arise during an evaluation study.

One way in which the evaluator has been responsive to potential
ethical problems is by requiring that evaluation reports be
approved by a panel of educators before release (page 6). This panel
will provide guidance on several ethical issues. However, the evaluator
has not considered the two other issues treated by this criterion.
The evaluator should provide evidence that he intends to comply with
protection of human subjects guidelines as applicable in the study.
Also, the evaluator should guarantee that the data collected during the
study will not be released to unauthorized personnel or be used
inappropriately.



B. Protocol: Are appropriate protocol steps planned?

   1. Are appropriate persons contacted in the
      appropriate sequence?                        (Yes)  No   ?   NA

   2. Are Department policies and procedures
      to be followed?                              (Yes)  No   ?   NA



The evaluator has given adequate consideration to the Protocol
criterion in the design. In this case, the evaluator plans to clear
virtually everything through the chief school officers (page n).
Although more specific protocol steps will evolve during the
evaluation study, the evaluator has set a procedure to meet initial
protocol needs.



As was noted earlier, the fictitious evaluation design of the
Hartman Reading Program is neither all good nor all bad. The design
has both strengths and weaknesses, and use of the checklist has helped
identify them. However, simply using the checklist is not enough.
Information about the evaluation design from the checklist should be
provided to the evaluator so that weaknesses in the design can be
discussed and corrected before the evaluation begins. By so doing, an
important step toward producing a helpful evaluation study will have
been taken.
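
A reviewer who wants to keep track of checklist ratings across many
items might record them in a small script. The sketch below is only an
illustration and is not part of the checklist itself: the item labels,
the `summarize` function, and the choice of Python are assumptions, and
the ratings shown are illustrative values for the General Criteria items
of the sample design. It counts each rating and lists the items rated
"No" or "?" so that they can be raised with the evaluator before the
evaluation begins.

```python
from collections import Counter

# The four responses offered for each checklist item.
VALID_RATINGS = {"Yes", "No", "?", "NA"}

# Illustrative ratings only; the labels are hypothetical shorthand
# for the General Criteria items of the sample design.
ratings = {
    "Ethical 1: test administration follows professional ethics": "Yes",
    "Ethical 2: protection of human subjects guidelines followed": "No",
    "Ethical 3: confidentiality of data guaranteed": "No",
    "Protocol 1: appropriate persons contacted in sequence": "Yes",
    "Protocol 2: Department policies and procedures followed": "Yes",
}


def summarize(item_ratings):
    """Return (counts per rating, items rated 'No' or '?')."""
    for item, rating in item_ratings.items():
        if rating not in VALID_RATINGS:
            raise ValueError(f"Unknown rating {rating!r} for item {item!r}")
    counts = Counter(item_ratings.values())
    # Items rated "No" or "?" are the ones to discuss with the evaluator.
    to_discuss = [
        item for item, rating in item_ratings.items() if rating in ("No", "?")
    ]
    return counts, to_discuss


counts, to_discuss = summarize(ratings)
for item in to_discuss:
    print("Discuss with evaluator:", item)
```

A tally of this kind only organizes the ratings; the judgment behind
each rating, and the discussion with the evaluator, remain the
reviewer's responsibility.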



IV. A REVIEW OF PREVIOUS WORK AS A BASIS FOR DETERMINING
THE ADEQUACY OF AN EVALUATION DESIGN

Most educators who have ever been involved in evaluation have worried
about determining the quality of the evaluation effort. Although implicit
standards have long been used in determining the quality of evaluation
plans, evaluation specialists have only recently begun to develop an explicit,
well-defined basis for determining the adequacy of such designs.

Michael Scriven (1969) coined the term "meta-evaluation" to refer
to the evaluation of evaluation. Since then, several evaluators have proposed
standards for determining the quality of evaluation designs.

Many specialists' proposed standards have evolved from their training 
backgrounds or from definitions of evaluation that they have adopted. 
Consideration of such proposals can help one understand the evolution of the 
checklist offered in the previous section. Because of the considerable effort 
that has recently gone into the development of a basis for evaluating evaluation 
designs, it is important to draw as much usable information as possible from 
these efforts.

Bases for judging evaluation designs have generally been presented in one
of three ways: (1) as guidelines that provide a format for evaluation designs,
(2) as essays describing elements of a good evaluation, or (3) as checklists 
that guide the application of standards to evaluation designs. Examples of each 
are included in this section. 

Guidelines for Evaluation Designs 

Worthen and Sanders (1973) suggested the following format for evaluation
designs, a set of elements that could be considered common to all evaluation
designs.³






SUGGESTED FORMAT FOR EVALUATION PROPOSALS 



I. Rationale (Why is this evaluation being done?)

II. Objectives of the Evaluation Study

    A. What will be the product(s) of the evaluation study?
    B. What audiences will be served by the evaluation study?

III. Description of the Program Being Evaluated

    A. Philosophy behind the program
    B. Content of the program
    C. Objectives of the program, implicit and explicit
    D. Program procedures (e.g., strategies, media)
    E. Students
    F. Community (federal, state, local) and instructional context
       of program

IV. Evaluation Design

    A. Constraints on evaluation design
    B. General organizational plan (or model for program evaluation)
    C. Evaluative questions
    D. Information required to answer the questions
    E. Sources of information; methods for collecting information
    F. Data collection schedule
    G. Techniques for analysis of collected information
    H. Standards; bases for judging quality
    I. Reporting procedures
    J. Proposed budget

V. Description of Final Report

    A. Outline of report(s) to be produced by evaluator
    B. Usefulness of the products of the study
    C. Conscious biases of evaluator that may be inadvertently injected
       into the final report



³ Worthen, B. R. and Sanders, J. R. Educational Evaluation: Theory
and Practice. Worthington, Ohio: Charles A. Jones, 1973. p. 301.







A similar format was suggested by Stake (1969) in the following guide
for a final evaluation report:⁴




Section I - Objectives of the Evaluation

    A. Audiences to be served by the evaluation
    B. Decisions about the program, anticipated
    C. Rationale, bias of evaluators

Section II - Specification of the Program

    A. Educational philosophy behind the program
    B. Subject matter
    C. Learning objectives, staff aims
    D. Instructional procedures, tactics, media
    E. Students
    F. Instructional and community setting
    G. Standards, bases for judging quality

Section III - Program Outcomes

    A. Opportunities, experiences provided
    B. Student gains and losses
    C. Side effects and bonuses
    D. Costs of all kinds

Section IV - Relationships and Indicators

    A. Congruences, real and intended
    B. Contingencies, causes and effects
    C. Trend lines, indicators, comparisons

Section V - Judgments of Worth

    A. Value of outcomes
    B. Relevance of objectives to needs
    C. Usefulness of evaluation information gathered




⁴ Stake, R. E. Evaluation design, instrumentation, data collection, and
analysis of data. Educational Evaluation. Columbus, Ohio: State
Superintendent of Public Instruction, 1969.





Essays About Evaluation Quality 

Essays on educational evaluation offer general statements about the
elements of good evaluation, and provide a second source of standards. One
such essay, by Worthen (1973), "A Look at the Mosaic of Educational Evaluation
and Accountability," covered the following considerations:



1. Conceptual Clarity

Conceptual clarity is an essential feature of any good evaluation
plan. By "conceptual clarity" I refer to the evaluator's
exhibiting a clear understanding of the particular evaluation
he is proposing. Is he planning a formative or summative
evaluation? Is it a comparative evaluation design or a single
program evaluation? Is the evaluation to be goal-directed,
with the design built around the measurement of attainment of
specific objectives, or goal-free with the design built around
lists of evaluative questions generated independently of the
goals? Answers to these questions should be apparent in any
good evaluation plan; for without clarity on these points,
proper evaluation could occur only by chance.

2. Characterization of Program

No evaluation is complete without a thorough, detailed
description of the program or phenomenon being evaluated.
Without such characterization, judgments may be drawn about
a program which never really existed. For example, the concept
of team teaching has fared poorly in several evaluations,
resulting in a general impression that team teaching is
ineffective. Closer inspection shows that the methods
frequently labeled "team teaching" provide almost no real
opportunities for staffs to plan together or work together
in direct instruction. Obviously, a better description of the
phenomenon would have avoided these misinterpretations completely.
One simply cannot evaluate adequately that which he cannot
describe accurately.

3. Recognition and Representation of Legitimate Audiences

Any evaluation will be adequate only to the extent to which
it provides for obtaining input from and reporting to all
legitimate evaluation audiences. An evaluation of a school
program which answers only the questions of the school staff
and ignores questions of parents, children and community groups
is inadequate. Each legitimate audience must be identified and
the objectives or evaluative questions of that audience
considered in designing a plan for data collection. Obviously,
some audiences will be more significant than others
and some weighting of their input might be necessary.
Correspondingly, the evaluation plan should provide for
receipt of appropriate evaluation information by each
audience which has a potential interest in the program.

4. Sensitivity to Political Problems in Evaluation

Many a good evaluation, unimpeachable in all of its technical
details, has failed because of its political naivete. It
is pointless to promise to collect sensitive data--e.g.,
principals' ratings of teachers--without first obtaining
permission from the office or individual who controls
those data. Procedures governing access to data and data
sources, and safeguards against misuse of evaluation data
must be agreed upon early in the project. Steps must be
taken to guarantee that program staff have opportunities
to correct factual errors in evaluation reports without
compromising the evaluation itself. These issues exist in
almost every evaluation and the more explicitly they are dealt
with, the more likely the evaluation is to survive political
pressures.

5. Specification of Information Needs and Sources

Good evaluators tend to develop and follow a blueprint which
tells them precisely what information they must collect
and through what sources that information is available. At
the very least, they know how (as Scriven puts it) to lay
snares at critical points in the game trails. Conversely,
the novice evaluator goes about randomly turning over stones
or beating the brush to see what he can find. No evaluation
can depend on a random, scattered "here a little, there a
little" approach to collecting data. An adequate evaluation
plan specifies at the outset the information which must be
collected. If the evaluation is goal-directed, the plan will
specify information that will help to determine whether the
objectives were attained. If the evaluation is built around
evaluative questions (of the "What would you need to know to
decide whether the program was a success or a failure?"
variety), the evaluation plan should specify information which,
when collected, will answer those questions. And in every
case, specifying needed information leads logically to
identification of the sources from which that information
can be obtained. Failure to attend to these seemingly
pedestrian but truly critical steps is one of the greatest
single reasons that many evaluations produce little useful
information.



6. Comprehensiveness/Inclusiveness

This category is really an elaboration of the previous
one. No evaluation can hope to collect all of the relevant
data--nor would it be desirable to do so, since there
will always be inconsequential and trivial data not worth
the bother to collect. Collecting too much data is seldom
the concern, however. The greater problem is collecting
enough data--or more precisely, collecting data on enough
important variables to be certain one has included in the
evaluation all the major considerations which are relevant.
A good evaluation includes all of the main effects, but also
includes provisions for remaining alert to unanticipated
side effects. A good comparative evaluation doesn't stop
with comparing the experimental arithmetic program with a
control group which receives no arithmetic instruction.
It goes on to identify the critical competitors--SMSG math,
Cuisenaire Rods, and so forth--and compares their new
program with those for which costs are roughly comparable.
In short, the weak evaluation is almost always characterized
by a narrow range of variables and omission of several
important variables. The wider the range and the more
important the variables included in the evaluation, the better
it generally is.

7. Technical Adequacy

More evaluations founder on this shoal than on almost any
other, and this is due to the scarcity of educational
evaluators who are even marginally competent in technical areas.
Good evaluations are dependent on construction or selection
of adequate instruments, the development of adequate sampling
plans, and the correct choice and application of techniques
for data reduction and analysis. Volumes have been written
on educational measurement, sampling, and statistics and it
would be pointless to try to review that knowledge here.
Suffice it to say that competence in these areas is essential
to most evaluations. Without knowledge and control of these
tools of his trade, the evaluator has little hope of producing
evaluation information which meets scientific criteria of
validity, reliability and objectivity.



8. Consideration of Program Costs

Educators are not econometricians and should not be expected
to be skilled in identifying all the financial, human or time
costs associated with programs they operate. That bit of
leniency cannot be extended to the evaluator, however, for
it is his job to bring these factors to the attention of
teachers and administrators who are responsible for the programs.
Educators are often faulted for choosing the more expensive
of two equally effective programs, just because the
expensive one is packaged more attractively or has been
more widely advertised. The real fault lies with the
evaluations of those programs which fail to focus on
cost factors as well as on other variables. As any
insightful administrator knows, costs are not irrelevant,
and it is important for him to know how much program X
will accomplish and at what cost so he may know what he
is gaining or giving up in looking at other options which
vary in both cost and effectiveness.



9. Explicit Standards/Criteria

It is always a bit disconcerting to me to read through an
evaluation report and be unable to find anywhere a statement
of the criteria or standards which were used to
determine the program's success or failure. The measurements
and observations taken in an evaluation cannot be
translated into judgments of worth without standards or
criteria. Is an in-service program for teachers successful
if 75% of the teachers attend 75% of the meetings? That
all depends on the standard that is set for the program.
What about a 60% attendance rate in a high school English
class--is that good or bad? Again it depends on the
standard. If it is a regular English class, with a standard
of 95%, 60% looks pretty bad. But in an English class for
rehabilitated dropouts who work part-time to support their
parents, the standard might be 50% and the attendance rate
of 60% might be quite acceptable. Every good evaluation
will include a statement of standards and criteria.



10. Judgments and/or Recommendations

The only reason for insisting on explicit standards or criteria
is that they are the stuff of which judgments and
recommendations are made, and these judgments and recommendations are
the sine qua non of evaluation. An evaluator's responsibility
does not end with the collection, analysis, and reporting of
data. The data do not speak for themselves. The evaluator
who knows those data well is in the best position to apply
standards for judging effectiveness. Making judgments and
recommendations is an essential part of the evaluator's job.
An evaluation without judgments is as much an indictment of
its author's sophistication as one with recommendations that
are not based on the data.




11. Reports Tailored to Audiences

I argued a few minutes ago that there are multiple audiences
for most evaluations and these audiences have different
informational needs. For example, when you complete an
evaluation, your colleagues in evaluation will be interested
in a complete, detailed report of your data collection
procedures, analysis techniques, and the like. Not so
for the school board, or the PTA or the little old lady in
tennis sneakers who heads the local taxpayer group. These
audiences do not share the evaluator's grasp of technical
details or his interest in test reliability and validity or
the appropriate choice of an error term in a randomized
blocks design. The evaluator will have to tailor reports
for these groups so that they depend on non-technical language,
and he must avoid over-use of tabular presentation of data
analyses. A typical evaluation might produce one omnibus
technical evaluation report which self-consciously includes
all the details and one or more non-technical evaluation
report(s) aimed at the important audience(s).

Another notion should be inserted here as well--that of
interim or even continual reporting of evaluation findings.
Timeliness is an important concern in evaluation. Information
that is presented too late to affect the decision for which
it is relevant is useless. Good evaluations will not depend
solely on the printed word, but will include a variety of
report formats--including "hot-line" telephone reporting--so
the information is reported whenever it is needed to make a
particular decision.

Other general standards which have been widely used include the following,
developed by Stufflebeam et al. (1971):⁵

1. Internal validity. Does the evaluation design provide
the information it is intended to provide? The results
of the evaluation study should present an accurate and
unequivocal representation of the object being evaluated.

2. External validity. To what extent are the results of the
study generalizable across time, geographical environment
and human involvement? In many small evaluation studies,
the concept of external validity is irrelevant since the
evaluator is interested in collecting and interpreting
information about one specific program at one point in time.
However, the concept may be quite important in large-scale
evaluation studies where sampling is used and findings
must be generalized back to the total population.

3. Reliability. How accurate and consistent is the
information that is collected? The evaluator should
be quite concerned about the adequacy of his measures
since his results can only be as good as the
information on which they are based.

4. Objectivity. How public is the information collected by
the evaluator? The evaluator should strive to collect
information and make judgments in such a way that the
same interpretations and judgments would be made by any
intelligent, rational person evaluating the program.

5. Relevance. How closely do the data relate to the
objectives of the evaluation study? Defining objectives
for an evaluation study enables the evaluator to check
himself on the relevance of his activities.

6. Importance. Given a set of constraints on the design of
an evaluation study, what priorities are placed on the
information to be collected or program components to
be evaluated? It is often tempting to study one relevant
aspect of a program in depth and to collect much
information which may subsequently prove to be less important at
the conclusion of the study than less detailed information
about another aspect might have been. It is the
responsibility of the evaluator to set priorities on the
data to be collected.

7. Scope. How comprehensive is the design of the evaluation
study? There are a wide variety of considerations to
explore, as emphasized in several papers presented in
the previous chapter. The evaluator must consciously avoid
the possibility of developing "tunnel vision" by taking a
wholistic approach to program evaluation.

8. Credibility. Is the evaluator believed by his audiences?
Are his audiences predisposed to act on his recommendations?
The evaluator-client relationship is an important one if
the evaluator wants his efforts to have some impact on the
program he is evaluating.

9. Timeliness. Will evaluation reports be available when they
are needed? Many evaluators have missed the chance to
influence action because they reported too much, too late.
When decisions affecting a program are being made, any
reliable information is better than none. The provision
of interim, often informal, reports will help to avoid this
problem of being too late to influence the decision.

10. Pervasiveness. How widely are the results of the
evaluation study disseminated? It is true that, in
many cases, only one audience needs to be addressed.
However, the evaluator is responsible for providing the
results of his study to all individuals or groups
who should know about the results.

11. Efficiency. What are the cost/benefits of the study?
Have resources been wasted when that waste could have
been avoided? Operating under the constraints imposed
on most evaluation studies, the evaluator is
responsible for making the best possible use of material
and human resources available to him.

⁵ Stufflebeam, D. L. et al. Educational Evaluation and Decision-Making
in Education. Itasca, Illinois: Peacock, 1971.

Checklists That Guide the Application of Standards to Evaluation Designs 

Checklists which guide the application of standards to evaluation designs
or reports are a third source of standards. These checklists cover many general
concerns; the most useful checklists also include highly specific, comprehensive
standards which can assist in determining the quality and completeness of
evaluation designs.

Each existing checklist seems unique in form, content and purpose;
nevertheless, many share common characteristics. Generally, checklists for judging
evaluation designs include considerations of the scientific or technical
adequacy of the evaluation, the practicality and cost efficiency of the design,
the usefulness of the data to be collected, and the responsiveness of the
design to legal and ethical issues.

Four checklists for judging evaluation designs are described below. The
first of the checklists, that written by Stake (1970),⁶ contains five general
areas in which evaluation designs are to be judged: (1) the evaluation itself,
(2) specifications of the program being evaluated, (3) program outcomes,
(4) relationships and indicators, and (5) the program's overall worth. Each
general area, in turn, covers specific considerations which, when relevant,
are to be judged on their individual adequacy.

⁶ Stake, R. E. A Checklist for Rating an Evaluation Report, Unpublished
manuscript, October, 1970.

The checklist by Bracht (1973) includes six areas on which evaluation
designs should be judged: (1) communication, (2) importance of the evaluation,
(3) design for making judgments, (4) design for obtaining descriptive data,
(5) reports, and (6) concerns.⁷ Detailed questions are included within each
of these six areas of concern.

Stufflebeam's (1974) checklist covers six aspects of the design:
(1) conceptualization of the evaluation, (2) socio-political factors,
(3) contractual/legal arrangements, (4) the technical design, (5) the
management plan, and (6) moral/ethical/utility questions.⁸ Rather than questioning
the adequacy of certain aspects of evaluation design, Stufflebeam seeks specific
information that should be included in an evaluation design.

The final checklist, compiled by Smith and Murray (1974),⁹ includes a
number of questions from other checklists. Smith and Murray address three
areas of evaluation design: (1) content descriptions, (2) evaluation
activities/results, and (3) document characteristics. Each of these major areas
is further divided into two subareas with appropriate exemplary questions
designed to determine the adequacy of those subareas.

Guidelines for evaluating school practices provide another source of
evaluation design standards. Directions for program audits produced by the
federal government and directions for evaluation audits produced by
auditing agencies contain examples of such criteria. Such guidelines
are also available in the National Study of School Evaluation's (NSSE)
Evaluative Criteria¹⁰ for secondary schools, middle schools, elementary
schools and multicultural programs. These guidelines, used by
accreditation teams throughout the country in evaluating school programs,
contain a comprehensive list of school characteristics useful in checking
the completeness of a design for evaluating a school program.

⁷ Bracht, G. H., Evaluation of the Evaluation Proposal, Unpublished
manuscript, 1973.

⁸ Stufflebeam, D. L., An Administrative Checklist for Reviewing Evaluation
Plans, Unpublished manuscript, April 1974.

⁹ Smith, N. L., and Murray, S. J., Evaluation Review Checklist, Unpublished
manuscript, 1974.



Summary

The review provided in this section demonstrates the extensiveness
of the work that has been done by educators in producing criteria for
judging evaluation designs and reports. Because of this considerable
effort, the practice of judging evaluation designs and reports is
becoming more and more common among educators who are involved with
producing or using evaluation studies on a daily basis. And, while
there are many differences among the various sets of criteria presented
in this section, many common threads of thought can be found. The
criteria presented earlier in this paper reflect those common elements.






¹⁰ Evaluative Criteria (Fourth Edition), National Study of Secondary
School Evaluation, Washington, D. C., 1969.






References 



Astin, A. W., and Panos, R. J. The evaluation of education programs.
In R. L. Thorndike (Ed.) Educational measurement. Washington,
D. C.: American Council on Education, 1971.

Bracht, G. H. Evaluation of the evaluation proposal. Unpublished
manuscript, 1974.

National Study of Secondary School Evaluation. Evaluative Criteria
(4th ed.). Washington, D. C.: Author, 1969.

Scriven, M. An introduction to meta-evaluation. Educational Product
Report, 1969, 2, 36-38.

Smith, N. L., and Murray, S. J. Evaluation review checklist.
Unpublished manuscript, 1974.

Stake, R. E. Evaluation design, instrumentation, data collection,
and analysis of data. Educational Evaluation. Columbus, Ohio:
State Superintendent of Public Instruction, 1969.

Stake, R. E. A checklist for rating an evaluation report. Unpublished
manuscript, 1970.

Stufflebeam, D. L., et al. Educational evaluation and decision-making
in education. Itasca, Illinois: Peacock, 1971.

Stufflebeam, D. L. An administrative checklist for reviewing evaluation
plans. Unpublished manuscript, 1974.

Worthen, B. R., and Sanders, J. R. Educational evaluation: theory and
practice. Worthington, Ohio: Charles A. Jones, 1973.

Worthen, B. R. A look at the mosaic of educational evaluation and
accountability. Research, Evaluation, and Development Paper
Series. Portland, Oregon: Northwest Regional Educational Laboratory.

Wright, W. J. and Worthen, B. R. Standards and procedures for
development and implementation of an evaluation contract. Alaska
Department of Education, 1975.






